Aarhus University
PhD Dissertation
Design and Analysis of Web Application Frameworks
Mathias Schwarz
Supervisor: Anders Møller
Submitted: January 29, 2013
Abstract
Numerous web application frameworks have been developed in recent years. These frameworks enable programmers to reuse common components and to avoid typical pitfalls in web application development. Although such frameworks help the programmer to avoid many common errors, we find that there are important, common errors that remain unhandled by web application frameworks. Guided by a survey of common web application errors and of web application frameworks, we identify the need for techniques to help the programmer avoid HTML invalidity and security vulnerabilities, in particular client-state manipulation vulnerabilities. The hypothesis of this dissertation is that we can design frameworks and static analyses that help the programmer avoid such errors.
First, we present the JWIG web application framework for writing secure and maintainable web applications. We discuss how this framework solves some of the common errors through an API that is designed to be safe by default.
Second, we present a novel technique for checking HTML validity for output that is generated by web applications. Through string analysis, we approximate the output of web applications as context-free grammars. We model the HTML validation algorithm and the DTD language, and we generalize the validation algorithm to work for context-free grammars.
Third, we present a novel technique for identifying client-state manipulation vulnerabilities. The technique uses a combination of output analysis and information flow analysis to detect flow in the web application that might be exploited by malicious clients.
We implement and evaluate the techniques to study their usefulness in practice. We find that JWIG is useful for implementing large web applications. We furthermore evaluate the static analysis techniques and find that they are able to detect real bugs with few false positives in open-source applications.
Resume
In recent years, a large number of web application frameworks have been developed. These frameworks make it possible for programmers to reuse solutions and to avoid typical errors in web applications. Although such frameworks help programmers avoid many types of errors, it turns out that there are still types of errors that are not yet handled adequately by frameworks. We investigate which types of errors often occur in web applications and which solutions different frameworks offer for these problems. This investigation shows that there is a need for techniques that can help the programmer avoid invalid HTML and security problems, especially vulnerabilities caused by manipulation of the client's state. The hypothesis of this dissertation is that we can design new frameworks and static analyses that can help the programmer avoid these types of errors.
First, we present the JWIG framework, which can be used to write secure web applications that are easy to maintain. We discuss how this framework solves some of the typical problems with an API that is designed to avoid errors.
Next, we present a new technique for verifying the correctness of generated HTML in web applications. We use string analysis to approximate the possible output with context-free grammars. We model the HTML validation algorithm and the DTD specification language and show how the validation algorithm can be generalized to work for context-free grammars.
Finally, we present a new technique for finding vulnerabilities caused by manipulation of the client's state. The technique uses a combination of an output analysis and an information flow analysis to find flows of data that can be exploited by a malicious client.
We implement and evaluate the techniques to investigate how well they work in practice. We find that JWIG is useful for implementing large web applications. We also evaluate the analysis techniques and find that they can be used to find errors in open-source programs that we have found on the web.
Acknowledgments
Thanks to the members of the Programming Languages group at Aarhus University for insightful discussions, inspiring friendships, and countless foosball tournaments. I am truly thankful to Anders Møller for his patient and insightful mentorship. He has been the best supervisor one can imagine. I thank Simon Holm Jensen for his valuable feedback during the writing of this dissertation. I would like to thank Henning Rohde and Anna Gringauze for an inspiring stay at Microsoft Research. Thanks to my parents and brothers for supporting and encouraging me. A very special thanks to my wife Rikke who has been a loving help and support throughout my studies. Our son Andreas has barely been with us for a year but I thank him for his happy smiles and for reminding me to never stop being curious.
Mathias Schwarz, Aarhus, January 29, 2013.
Contents
Abstract
Resume
Acknowledgments
Contents
I Overview
1 Introduction
  1.1 Overview
  1.2 Method
  1.3 Contributions
  1.4 Software packages
2 Web frameworks and web applications
  2.1 Purpose and structure of web frameworks
    2.1.1 Components of a web framework
    2.1.2 Static analysis in the presence of web application frameworks
  2.2 Software engineering principles in web frameworks
    2.2.1 Cohesion, coupling, and the MVC pattern
    2.2.2 Safety by default
  2.3 Common web application errors
    2.3.1 Output correctness
    2.3.2 Security
  2.4 A survey of web application frameworks
    2.4.1 PHP
    2.4.2 Java Servlets
    2.4.3 Java Server Pages
    2.4.4 JSF
    2.4.5 Struts 2
    2.4.6 Seaside
    2.4.7 Hop
  2.5 The need for a new web framework and analysis for the existing ones
3 Designing a new web application framework
  3.1 Overview of JWIG
  3.2 Introduction to JWIG application programming
    3.2.1 Xact syntax
    3.2.2 Web methods
    3.2.3 Session data
    3.2.4 Handlers
    3.2.5 Client/server communication
  3.3 Relation to existing frameworks
    3.3.1 Relation to the previous version of JWIG
    3.3.2 The JWIG approach compared to MVC
    3.3.3 Relation between submit handlers and Seaside callbacks
  3.4 Static analysis of JWIG applications
    3.4.1 XHTML validation of output construction
    3.4.2 Web application page graphs
    3.4.3 Link consistency
    3.4.4 Filter coverage
    3.4.5 Parameter names
  3.5 Evaluation of the JWIG web application framework
    3.5.1 Safety of JWIG applications
    3.5.2 Case study: CourseAdmin
4 Static analysis for existing frameworks
  4.1 Approximating web application output
    4.1.1 Output-stream flow graphs
    4.1.2 From Java Servlets and JSP to output-stream flow graphs
    4.1.3 From output-stream flow graphs to context-free grammars
  4.2 HTML validation in WARLord
    4.2.1 Annotating context-free grammars with contexts
    4.2.2 HTML validation of an annotated grammar
    4.2.3 On HTML5
    4.2.4 Comparison to related work
    4.2.5 Evaluation
  4.3 Client-state manipulation vulnerability detection
    4.3.1 Related security analysis techniques for web applications
    4.3.2 Overview of the analysis technique in WARLord
    4.3.3 Evaluation
5 Conclusion
II Publications
6 JWIG: Yet Another Framework for Maintainable and Secure Web Applications
  6.1 Introduction
  6.2 Architecture
  6.3 Generating XML Output
  6.4 XML Producers and Page Updates
  6.5 Forms and Event Handlers
  6.6 Example: MicroChat
  6.7 Parameters and References to Web Methods
  6.8 Session State and Persistence
  6.9 Caching and Authentication
  6.10 Additional Examples
    6.10.1 QuickPoll
    6.10.2 GuessingGame
  6.11 Case Study: CourseAdmin
  6.12 Conclusion
7 HTML Validation of Context-Free Languages
  7.1 Introduction
    7.1.1 Outline of the Paper
    7.1.2 Example
  7.2 Related Work
  7.3 Parsing HTML Documents
    7.3.1 A Model of HTML Parsing
  7.4 Parsing Context-Free Sets of Documents
    7.4.1 Generating Constraints
    7.4.2 Solving Constraints
    7.4.3 Example
  7.5 Experimental Results
  7.6 Conclusion
  7.7 Proof of Theorem 7.1
8 Automated Detection of Client-State Manipulation Vulnerabilities
  8.1 Introduction
  8.2 Client-State Manipulation Vulnerabilities
  8.3 Outline of the Analysis
  8.4 Identifying Client State
    8.4.1 Analyzing HTML Output
    8.4.2 Analyzing Input Parameters
  8.5 Identifying Shared Application State
  8.6 Information Flow from Client State to Shared Application State
  8.7 Automatic Configuration of a Security Filter
  8.8 Evaluation
    8.8.1 Experiments
    8.8.2 Summary of Results
  8.9 Related Work
  8.10 Conclusion
Bibliography
Part I
Overview
Chapter 1
Introduction
The hypothesis of this dissertation is that we can design frameworks and static analyses that help the programmer avoid invalid HTML and avoid programs that contain common security vulnerabilities, in particular client-state manipulation vulnerabilities. For such frameworks and analyses to be useful to a programmer, it must be possible to apply them to large-scale programs.
When the web was invented about 20 years ago, it was designed for publishing static content. During the following years it evolved into a rich platform for implementing software systems. With simple scripts that generated content, programmers became able to write applications. These applications allowed interactions between clients and servers without requiring the client to install additional software. This helped the web to quickly become a popular platform. Today, we call the combination of a client interface and a server that allows interactions and provides data for the client over the HTTP protocol a web application.
As sketched in Figure 1.1, web application interactions use the stateless HTTP protocol. The HTML documents returned by the server may include forms for server interaction and JavaScript code for implementing rich interfaces. In web applications, the server typically stores state in a database and generates HTML documents depending on this state. Similarly, the client may store some state in JavaScript, as cookies, or as client-state parameters (see Section 4.3). As we will discuss in this dissertation, both client and server storage require careful security considerations.
Figure 1.1: Web interactions follow a stateless request-response protocol but both client and server may store state.
With the increased popularity of the web as a platform, web application programmers began developing web application frameworks of reusable components to facilitate faster web application development and better application structure. Starting from simple CGI scripts [72], web application frameworks have today evolved into complex libraries that allow rapid development of structured web applications. This development has resulted in numerous frameworks and techniques for web programming. This dissertation surveys the field of web application frameworks. From this survey, we identify the need to consider techniques to guarantee security and output correctness as central tasks of web application frameworks, and we argue that such techniques are useful for web application programmers.
Web application frameworks help the programmer avoid common errors by reusing solutions to common problems. However, HTML invalidity and security vulnerabilities remain a problem in today's web applications. With framework design and static application analysis, this situation can be improved. This dissertation discusses such framework design and static analyses and argues that it is useful for programmers to employ these techniques while developing web applications. An important observation is that many such errors relate to HTML that is generated by the server, and this dissertation investigates techniques that rely on analysis of this output.
The dissertation applies two different methods for avoiding errors in applications: 1) It presents a novel web application framework that is designed to be safe by default against common security problems and that allows for easy and precise analysis of HTML validity. 2) It presents analyses for existing web application frameworks. These analyses are evaluated on a set of third-party, open-source applications to study the usefulness of the techniques in practice.
1.1 Overview
The dissertation is composed of three parts. Chapter 2 contains an overview of the area of web application programming, including central concepts, frameworks, and security considerations. Furthermore, it presents a survey of currently popular web application frameworks. Through this survey, we argue that analyses are required to avoid the most common web application program errors, and that a new web application framework should be designed to avoid these errors.
Based on Chapter 2, Chapter 3 presents the design of the JWIG web application framework, which is further discussed in the paper that is included as Chapter 6 of this dissertation. The JWIG framework is designed for writing web applications that are correct, secure, and maintainable, and Chapter 3 discusses how the JWIG framework solves the common problems identified for existing frameworks. The JWIG framework is a thoroughly redesigned successor to an older framework of the same name, and we also briefly study the conceptual differences between the two.
Chapter 4 presents static analyses for existing frameworks. These analyses allow reasoning about security and correctness for web applications written using existing frameworks. This work includes output validation for web applications as well as a security analysis to guard against client-state manipulation attacks.
The reader of this dissertation is assumed to be familiar with the areas of static program analysis, context-free grammars, and regular languages, as well as with the concepts of Java, HTML, XML, and HTTP.
Part II contains the publications I have co-authored as part of my PhD studies at Aarhus University. They are all submitted in extended form along with this dissertation:
• JWIG: Yet Another Framework for Maintainable and Secure Web Applications, with Anders Møller. Appeared in Proc. 5th International Conference on Web Information Systems and Technologies, March 2009.
• HTML Validation of Context-Free Languages, with Anders Møller. Appeared in Proc. 14th International Conference on Foundations of Software Science and Computation Structures, 2011.
• Automated Detection of Client-State Manipulation Vulnerabilities, with Anders Møller. Appeared in Proc. 34th International Conference on Software Engineering, 2012.
1.2 Method
This section describes the methods that have been applied to investigate the hypothesis.
Implementation
The techniques that are described in this dissertation have all been implemented as software packages (see Section 1.4). Implementing an analysis for web frameworks requires specialization towards individual frameworks, since analyzing the framework code itself is likely to yield imprecise results (Section 2.1.2).
In our implementation work, we have decided to focus on the Java programming language and the Servlet, JSP, and Struts web application frameworks. We assume that the results of analyzing framework-based Java web applications are representative of the results that would arise from analyzing similar applications and frameworks in other languages. Section 2.4 gives a more detailed description of the structure of the web application frameworks that our tool can handle and compares them to widely used web application frameworks.
The JWIG web application framework is likewise implemented in Java and demonstrates an extension to the Java programming language that has useful properties for web application programmers. A similar extension could be created for other programming languages.
Experimental evaluation
We have evaluated the analysis techniques by applying the software to open-source benchmarks. We have evaluated the techniques based on two criteria: 1) The precision of the analysis, in particular how many false positives arise when running the tool on the benchmarks. We determine the number of false positives through manual inspection of the warnings given by the tool. 2) The usefulness to the programmer, in particular how well the technique guides the programmer towards correcting the issue in the code. We evaluate this property by discussing the process needed to assess the warnings given by the tool.
In the case of JWIG (see Chapter 3), we have implemented a software system, CourseAdmin, using the framework. CourseAdmin serves as a case study for evaluating JWIG in relation to the design goals of the JWIG web framework.
The experimental evaluation is also useful for identifying the need for future improvements of the evaluated techniques. The experimental evaluations are described in further detail in the individual sections for each of the techniques.
1.3 Contributions
This dissertation presents the following main contributions:
• We survey the most influential web application frameworks and evaluate them in terms of output correctness, security properties, and well-established software engineering principles.
• We identify the need for, and present the design of, a new web application framework, JWIG, that avoids problems that are common to the existing frameworks. The framework is evaluated through a comparison with existing frameworks and by implementing a large web application, CourseAdmin.
• We present and evaluate a novel algorithm for validating dynamically generated HTML pages. The algorithm is applicable to all SGML and XML based languages that are described by DTD schemas. The algorithm is furthermore able to include more SGML features in the validity check compared to previously available methods. These features include SGML content model exceptions and optional start tags. The usefulness of the approach is evaluated on a set of benchmark applications.
• We present and evaluate a technique for detecting security vulnerabilities that are related to client-state manipulation. Client-state manipulation vulnerabilities allow a malicious client to change data that the server stores as part of the document on the client side. The usefulness of the approach is evaluated on a set of benchmark applications.
1.4 Software packages
I have worked on a number of software packages as part of my PhD studies. First and foremost, these packages serve to evaluate the usefulness of the techniques described in my papers. In connection with my publications I have worked on the following software packages:
• JWIG¹,² - A framework for writing Java web applications. This framework is further discussed in Chapter 3.
• CourseAdmin - A course administration tool that serves as a large-scale benchmark of JWIG. This tool currently serves as the course administration tool used for most courses at the Department of Computer Science.
• WARLord³ - A tool for reasoning about Java web application response output and information flow in web applications. The tool implements the HTML validity analysis and the client-state manipulation analysis with a shared front end. These analyses are discussed in Chapter 4.
¹ http://www.brics.dk/JWIG/
² The JWIG analysis suite was implemented by Esben Andreasen.
³ http://www.brics.dk/WARLord
Chapter 2
Web frameworks and web applications
Web application frameworks fall into two major categories: server-based frameworks concerned with programming the server side of web applications and client-based frameworks concerned with programming browsers. Client-based frameworks enable the programmer to write applications with rich and highly interactive user interfaces, while server-based frameworks allow the application to run on a machine that is controlled by the application provider. This makes it possible for the programmer to implement his program in the language of his choice and allows him to draw on mature and well-known techniques and frameworks for implementing his application.
Some users will use web applications in ways not anticipated by the programmer, and some will use them with malicious intent. Some will use browsers different from the ones used by the developers of the applications. This results in challenges for web application programmers, and frameworks are useful for solving many such challenges.
Recently, there has been an increased focus on client-based web frameworks. The lack of type safety in the JavaScript programming language has motivated tools like TAJS [34] that assist the programmer in writing better client-side programs. In spite of new client-based technologies, server-based frameworks remain widely used, and it remains an important goal to improve the stability and correctness of server-based web applications. Many of the problems that are common to server-side web applications remain unsolved. The focus of this dissertation is on such server-based applications, and in the following sections "web application framework" refers to server-based web application frameworks unless explicitly stated otherwise.
Much work has been done to improve the quality of web applications. Testing and verification techniques that are applicable to programs in general are also useful for web application programmers. Web applications, however, share common traits that make it possible to create specialized analysis techniques useful for analyzing any web application. In this chapter, we will survey the area of web application frameworks. This will serve as a basis for understanding the applicability of the solutions presented later in this dissertation. Through this survey, we will identify the need for solutions to problems that are present across the current web application frameworks. In the following chapters, we will discuss solutions for these problems, both solutions that avoid the problems at the framework level and solutions through static analysis.
2.1 Purpose and structure of web frameworks
As discussed above, web application frameworks make it possible for programmers to reuse common implementations of some of the tasks and to abstract away lower-level details of the interaction with the client. In this section we will identify the core components of server-based web application frameworks and survey a portion of the most widely used web frameworks based on their design of these components.
2.1.1 Components of a web framework
Web application frameworks differ widely in their levels of abstraction and in the number of features they make available to the programmer. Most frameworks contain myriads of features, some of them general purpose, some of them specialized towards specific application architectures encouraged by the specific framework. Four basic components are, however, made available by all web application frameworks: a dispatcher, a decoder, a generator, and a store. This section presents an overview of these components and the design space for each of them. In Section 2.4 we will survey web application frameworks to see examples of each of the points in the design space.
The dispatcher
A dispatcher defines the relationship between HTTP requests and web application code by locating and invoking code based on the contents of HTTP requests. A dispatcher can be explicitly configured through a configuration file, or it may be implicitly configured through conventions. In the latter case, conventions typically determine how the dispatcher generates mappings from URLs to code based on the class or file structure of the program code.
The design of the dispatcher defines what the basic unit of the web application framework is. A basic unit corresponds to a single possible entry point that can be invoked by the dispatcher. Basic units can be source files, in which case the dispatcher will typically start executing the code from the beginning of the file (see for example PHP and JSP in Section 2.4). They can be classes, where the dispatcher invokes a method based on a predefined interface (see for example Servlets or Seaside). They can also be individual functions, as is the case for Hop and JWIG. We will refer to an instance of such a basic unit as a page in the application.
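The two configuration styles can be illustrated with a minimal sketch in plain Java. It does not reproduce any of the surveyed frameworks: the class DispatcherSketch, the Handler interface, and the rule that derives a path from a class name are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DispatcherSketch {
    /** A basic unit: one entry point that the dispatcher can invoke (hypothetical interface). */
    interface Handler {
        String handle(Map<String, String> parameters);
    }

    /** Example basic unit used to show convention-based registration. */
    static class HelloPage implements Handler {
        public String handle(Map<String, String> parameters) {
            return "<p>Hello</p>";
        }
    }

    private final Map<String, Handler> routes = new HashMap<>();

    /** Explicit configuration: the path is stated directly, as a configuration file would do. */
    void register(String path, Handler handler) {
        routes.put(path, handler);
    }

    /** Implicit configuration: the path is derived from the class name by convention. */
    void registerByConvention(Handler handler) {
        String name = handler.getClass().getSimpleName();
        routes.put("/" + name.toLowerCase(Locale.ROOT), handler);
    }

    /** Locates the basic unit for a request path and invokes it. */
    String dispatch(String path, Map<String, String> parameters) {
        Handler handler = routes.get(path);
        if (handler == null) {
            return "404 Not Found";
        }
        return handler.handle(parameters);
    }
}
```

In this sketch the basic unit is a class with a predefined interface; a file-based or function-based dispatcher would differ only in what the route table points to.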
The decoder
A decoder decodes requests from clients and provides means for the web application to read parameters, headers, and request body data sent as part of the request. We can categorize decoders into two types:
1) Pull decoders provide an interface to the decoder itself. The programmer retrieves the request parameters by invoking methods on this interface. A map from names to parameter values is the simplest form of such a pull decoder. Examples of such decoders exist in PHP, JSP, and the Servlet framework.
2) Push decoders inject the request values into method parameters or Java Bean properties before the dispatcher invokes the entry point. The choice between parameters and properties typically depends on the choice of basic unit for the dispatcher: parameters are used if the basic unit is a function, and Java Bean properties are used otherwise. Examples of frameworks that use push decoders include JSF, Struts, Hop, Seaside, and JWIG.
The push decoder approach makes it simple to identify the interface of a web application, while the pull approach provides the programmer with flexibility. Most of the push decoder frameworks also provide means for the programmer to read parameters in pull style for situations where the added flexibility is necessary. Of the push decoder frameworks discussed later in the chapter, only Hop and Seaside are purely push style and provide no pull features.
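As a rough illustration of the difference, the sketch below implements the same greeting in pull style and in push style. It is plain Java and not the API of any framework mentioned here; the names DecoderSketch, greetPull, GreetAction, and inject are hypothetical, and the push decoder only handles String-valued bean properties.

```java
import java.lang.reflect.Method;
import java.util.Map;

class DecoderSketch {
    /** Pull style: the application code asks the decoder (here a plain map) for each value. */
    static String greetPull(Map<String, String> request) {
        String name = request.get("name");
        return "Hello " + (name == null ? "anonymous" : name);
    }

    /** Push style: the setters of this bean declare which parameters the page accepts. */
    public static class GreetAction {
        private String name = "anonymous";

        public void setName(String name) {
            this.name = name;
        }

        public String execute() {
            return "Hello " + name;
        }
    }

    /** A tiny push decoder: copies request values into matching String setters of the bean. */
    static void inject(Object bean, Map<String, String> request) {
        for (Method m : bean.getClass().getMethods()) {
            String n = m.getName();
            boolean isStringSetter = n.startsWith("set") && n.length() > 3
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == String.class;
            if (!isStringSetter) {
                continue;
            }
            // setName -> name
            String property = Character.toLowerCase(n.charAt(3)) + n.substring(4);
            if (request.containsKey(property)) {
                try {
                    m.invoke(bean, request.get(property));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }
}
```

The push variant exposes the accepted parameters in the signature of GreetAction, which is what makes the interface of the application easy to identify, whereas the pull variant can read any key at any point in the code.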
The generator
The generator constructs the output that is returned to clients in response to requests to the server. The generator has a large variety of design features, which we will try to categorize here.
The web framework may provide a domain-specific language (DSL) for writing templates intermixed with program code, or it may rely on general-purpose language syntax only. Template languages typically permit the programmer to write HTML code fragments in exactly the same syntax as is used for writing static HTML documents, and they provide some way to combine these templates into a document. Frameworks without templates typically allow the programmer to generate output through function calls that have side effects in the generator.
Templates are used in the PHP, JSP, JSF, Struts, and JWIG frameworks, while the Servlets, Seaside, and Hop frameworks rely on the syntax of Java, Smalltalk, and a Scheme-like language, respectively. Furthermore, JSP and Struts allow the programmer to intermix templates and calls that have side effects in the generator.
Orthogonally to the inclusion of template syntax, the output might be represented as a first-class value in the programming language or be implicitly represented by the runtime system, for example as a result of appending data to an output stream read by the client. While first-class values offer a high degree of freedom to the programmer, the stream approach allows data to be retrieved and rendered by the client while the server side is still executing.
A further design choice for the output representation is whether the values are mutable or immutable. On the client side, the document is represented as a first-class, mutable structure, the DOM [30]. There are only a few examples of such a representation in server-side web application frameworks.
Orthogonally to these types, the generator may offer various degrees of protection against cross-site scripting attacks. We will get back to this issue in Section 2.3.2.
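To illustrate one point in this design space, the following sketch represents output as a first-class, immutable value and escapes untrusted text when it is inserted. It is a minimal illustration under stated assumptions, not the template API of JWIG or of any other framework: the class Template, its [name] gap notation, and the methods of, plugText, and plug are hypothetical.

```java
final class Template {
    private final String markup; // never modified after construction

    private Template(String markup) {
        this.markup = markup;
    }

    /** Wraps trusted, programmer-written markup; gaps are written as [name]. */
    static Template of(String trustedMarkup) {
        return new Template(trustedMarkup);
    }

    /** Returns a new template in which the gap is filled with escaped text. */
    Template plugText(String gap, String untrustedText) {
        return new Template(markup.replace("[" + gap + "]", escape(untrustedText)));
    }

    /** Returns a new template in which the gap is filled with another template. */
    Template plug(String gap, Template fragment) {
        return new Template(markup.replace("[" + gap + "]", fragment.markup));
    }

    /** Escapes HTML metacharacters and the gap delimiter so text cannot introduce markup or gaps. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("[", "&#91;");
    }

    @Override
    public String toString() {
        return markup;
    }
}
```

Because every operation returns a new value, fragments can be passed around and combined freely; the price, compared to a stream-based generator, is that nothing reaches the client until the complete document has been built.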
The store
The store holds inter-request data. The store is often separated into various scopes, such as an application scope where the data is shared between all requests and clients, a session scope where the data is shared among requests from the same client, and a request scope where the data is shared only for the current request. Frameworks may provide fewer or more scopes than this, or they may provide only the scope features of the hosting programming language.
The store may require separation from the rest of the code, so that the structure of the store must be represented as classes that are separate from the code that interacts with the generator. It might also allow integration, so that the programmer can store data as part of the same code that interacts with the generator.
Finally, the store may be typed, so that the type of a value is guaranteed by the type system of the programming language, or it may be untyped and leave it up to the programmer to ensure type correctness. In Java, the values may be represented as properties of Java Beans to obtain a typed store or as string-to-object maps to obtain an untyped store. In the latter case, the programmer must cast the value to the expected type. Some frameworks, such as JSP, offer both typed and untyped storage.
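The difference between a typed and an untyped store can be sketched in a few lines of plain Java. The names StoreSketch, VisitState, and the "count" key are hypothetical and only serve the illustration.

```java
import java.util.Map;

class StoreSketch {
    /** Untyped store: a string-to-object map; every read needs a cast. */
    static int readCountUntyped(Map<String, Object> session) {
        // Throws ClassCastException at run time if the stored value is not an Integer.
        return (Integer) session.get("count");
    }

    /** Typed store: a bean whose property type is checked by the compiler. */
    static final class VisitState {
        private int count;

        int getCount() {
            return count;
        }

        void setCount(int count) {
            this.count = count;
        }
    }

    static int readCountTyped(VisitState session) {
        return session.getCount(); // no cast needed
    }
}
```

With the untyped map, storing a String under "count" compiles without complaint and only fails when the value is read back; with the bean, the corresponding mistake is rejected by the compiler.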
2.1.2 Static analysis in the presence of web application frameworks
The large variety of web application frameworks and the differences in their components pose a challenge when creating static analysis tools for web applications. Web application frameworks often rely on highly reflective code that is hard to analyze precisely. Therefore, tools are typically limited in scope to a single or a few web application frameworks for which specialized analysis support has been implemented in the tool. This is also the case for the WARLord tool discussed later in this dissertation.
So far, little work has been done to overcome this challenge. Recently, Sridharan et al. presented the F4F (Framework for Frameworks) [78] system, in which they are able to describe framework-related data flow through a specification language, WAFL. While they do not identify the components as such, they specify data flow from the decoder to the program code and through the framework store. Furthermore, the dispatcher is handled in enough detail to model entry points for an analysis. From the WAFL specification, the tool generates easily analyzable code for framework data flow and inserts the code into the Java byte code of the program. This allows the tool to analyze the program much more precisely compared to the result of analyzing the framework code. They demonstrate how they are able to implement a high-precision taint analysis that is applicable to all frameworks supported by F4F, and they demonstrate how Servlets, JSP, Spring MVC, and Struts can be modeled in the WAFL language. So far, the tool only supports Java, but the approach seems applicable to other languages as well.
It might be possible to generalize the ideas of F4F to also include the generator and to generalize F4F to support additional frameworks and languages. The supported frameworks are very similar, and it remains to be seen whether code generation is sufficient to handle frameworks that are substantially different from the JSP/Servlet family.
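The following sketch hints at why reflective framework code is hard to analyze and why replacing it with generated, direct code helps. It is only an illustration of the general idea: it is not output of F4F, it does not use WAFL, and the names ReflectionSketch, SaveAction, invokeReflectively, and invokeDirectly are hypothetical.

```java
class ReflectionSketch {
    /** An application entry point that a framework would normally locate by name. */
    public static class SaveAction {
        public String execute(String input) {
            return "saved:" + input;
        }
    }

    /**
     * Framework-style invocation: the target class is only known as a string at run time,
     * so a static analysis cannot easily tell which execute method receives the input.
     */
    static String invokeReflectively(String className, String input) {
        try {
            Class<?> type = Class.forName(className);
            Object action = type.getDeclaredConstructor().newInstance();
            return (String) type.getMethod("execute", String.class).invoke(action, input);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Generated-style stub: the same data flow written as a direct call, which makes the
     * flow from the request value into SaveAction.execute explicit in the code.
     */
    static String invokeDirectly(String input) {
        return new SaveAction().execute(input);
    }
}
```

Both methods compute the same result, but only the second exposes the call target in the program text, which is the property that an analysis of framework-based applications benefits from.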
2.2 Software engineering principles in web frameworks
Web application developers must know a myriad of design patterns and architecture patterns to write and to understand modern web applications. In many cases such patterns are integral parts of web application frameworks. This section is not meant as a general introduction to software engineering best practices or to architecture principles in general; many good books already exist on this topic. Rather, in this section we will briefly discuss these principles at the level needed to understand the rest of this dissertation.
2.2.1 Cohesion, coupling, and the MVC pattern
The notions of high cohesion and low coupling describe applications that are structured into multiple loosely coupled components [79]. In loosely coupled applications, changes can be made locally without affecting other parts of the application. Low coupling improves maintainability of the code. Dually, in highly cohesive systems, code that is closely related in functionality is also located closely together in the source. High cohesion improves readability of the code. High cohesion and low coupling can be properties of the overall architecture as well as of the implementation and structure of the application code.
Several architectural patterns promote high cohesion and low coupling. In particular, the model-view-controller (MVC) pattern [8,43,74] has gained popularity in web application frameworks [46]. In this pattern, the use of the generator (the view) is separated from the model that represents the values in the store and from the controller that updates the model and view upon client interaction. Many good sources explain the MVC pattern, and the reader is assumed to be familiar with it.
In Chapter 3 we will discuss how a web framework can guide programmers towards high cohesion and low coupling in web applications without requiring the applications to adhere to a strict and specific implementation of the model-view-controller architecture pattern.
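As a brief reminder of the separation described above, the following minimal sketch splits a guestbook into the three roles. The classes GuestbookModel, GuestbookView, and GuestbookController are hypothetical and are not taken from any framework discussed in this dissertation.

```java
import java.util.ArrayList;
import java.util.List;

// Model: represents the values in the store.
class GuestbookModel {
    private final List<String> entries = new ArrayList<>();

    void add(String entry) { entries.add(entry); }

    List<String> entries() { return List.copyOf(entries); }
}

// View: the generator. It turns the model into output and changes no state.
class GuestbookView {
    String render(GuestbookModel model) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String entry : model.entries()) {
            html.append("<li>").append(escape(entry)).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    private static String escape(String s) {
        // '&' is replaced first so that the entities produced below stay intact.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

// Controller: reacts to client interaction, updates the model, and asks the
// view to regenerate the output.
class GuestbookController {
    private final GuestbookModel model;
    private final GuestbookView view;

    GuestbookController(GuestbookModel model, GuestbookView view) {
        this.model = model;
        this.view = view;
    }

    String handlePost(String entry) {
        model.add(entry);
        return view.render(model);
    }
}
```

Because the view depends only on the model's read interface, the markup can be changed without touching the controller. This is the local-change property that low coupling is meant to provide.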
2.2.2 Safety by default
Safety by default is the principle that the framework, rather than the application itself, should guard against program errors and security vulnerabilities. In such a system, creating an application that has a certain type of error or vulnerability requires explicit action from the programmer. For example, in statically typed programming languages it is, assuming that the type system is sound, impossible to unknowingly write a program that results in a type error at runtime. Most such languages do, however, provide specific features that allow the programmer to disable this safety. In the case of Java, one such feature is type casting, which serves to override the type of an expression in the type checker.
Similarly, web application frameworks may guard against common types of errors and vulnerabilities by exposing an API that is designed to make it impossible to write an incorrect or vulnerable program. Applied at the web application framework level, the principle of safety by default can address common problems in web applications such as cross-site scripting and HTML invalidity. We will discuss common errors in web frameworks in Section 2.3 and see examples of safe-by-default design later.
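The principle can be sketched with a small output API in which escaping is the default and inserting raw markup requires an explicitly named call. The Markup class and its method names are hypothetical; the sketch only illustrates the idea and is not the API of any real framework.

```java
// Hypothetical output builder; not the API of any existing framework.
final class Markup {
    private final StringBuilder out = new StringBuilder();

    // Default path: untrusted text is always escaped before it is emitted.
    Markup text(String untrusted) {
        out.append(escape(untrusted));
        return this;
    }

    // Explicit opt-out, comparable to a type cast in Java: the programmer
    // must take a visible action to bypass the safety check.
    Markup unsafeRaw(String trustedHtml) {
        out.append(trustedHtml);
        return this;
    }

    @Override
    public String toString() {
        return out.toString();
    }

    private static String escape(String s) {
        // '&' is replaced first so that the entities produced below stay intact.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }
}
```

With this design, a call such as new Markup().text(userInput) cannot introduce a script element, whereas every use of unsafeRaw is easy to find in a code review.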
2.3 Common web application errors
Two groups of errors relate directly to client interactions and are therefore of particular interest to web application developers: output correctness and security vulnerabilities. Valid web application output guarantees to the programmer that all standards-compliant web browsers will render the documents consistently. Security vulnerabilities may allow malicious clients to attack the system and gain unintended privileges or knowledge. In this section, we will discuss these two groups of errors in more detail.
The solutions to web application errors fall into three categories: 1) static solutions where the program is analyzed by a tool prior to running the program, 2) dynamic solutions where the error is detected and prevented at runtime, and 3) framework solutions where the language or the framework is constructed in such a way that it is impossible to unknowingly write a vulnerable program. That is, the framework is safe by default.
2.3.1 Output correctness
When clients interact with web applications, clients expect the server to communicate using some of the established web standards. The HTML family of languages [45] up to and including HTML 4 are specified using the DTD language. The XML version of HTML, XHTML, is described in a similar specification language, XML DTD [52]. In this dissertation the word HTML does not refer to XHTML unless explicitly stated. A DTD description allows a validator to determine whether an HTML document is valid. We call an HTML document valid if it conforms to one of the HTML DTD specifications. Validity can be verified for a single HTML document by running an off-the-shelf validator that compares the document to a DTD.
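Validation of a single document can be sketched with the validating XML parser in the Java standard library. The sketch rests on two simplifying assumptions: it uses XML DTD rather than the SGML-based DTD language of HTML 4, and it embeds a toy DTD with only a handful of elements instead of a real HTML DTD. The class name DtdCheck is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdCheck {
    // Toy DTD given as an internal subset. It is NOT the HTML DTD; it only
    // allows paragraphs inside body and bold text inside paragraphs.
    static final String DOCTYPE =
          "<!DOCTYPE html [\n"
        + "  <!ELEMENT html (body)>\n"
        + "  <!ELEMENT body (p*)>\n"
        + "  <!ELEMENT p (#PCDATA|b)*>\n"
        + "  <!ELEMENT b (#PCDATA)>\n"
        + "]>\n";

    // Returns the list of validity problems; an empty list means "valid".
    static List<String> validate(String document) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity violation
                }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // document is not even well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + document)));
        } catch (Exception e) {
            problems.add(e.getMessage());
        }
        return problems;
    }
}
```

For example, validate("<html><body><p>Hello <b>World</b></p></body></html>") reports no problems, while a document that places b directly inside body is rejected. As with any off-the-shelf validator, the check covers only the one document it is given.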
import javax.servlet.http.*;
import java.io.IOException;
import java.io.PrintWriter;

public class InvalidHTML extends HttpServlet {

  public void doGet(HttpServletRequest request,
                    HttpServletResponse response)
      throws IOException {
    PrintWriter writer = response.getWriter();
    writer.write("
Figure 2.1: A Java servlet that outputs invalid HTML in one case.
Since the HTML specifications only prescribe the meaning of valid documents, invalid HTML documents are often rendered differently, depending on which browser is used. For this reason, careful HTML document authors validate their documents, for example using the validation tool provided by W3C. Chen et al. surveyed a large set of web pages and showed that many existing HTML documents on the web are invalid [9], so validity is an important concern for programmers. Web applications generate HTML dynamically, and while the W3C validator can validate individual documents, it cannot guarantee that all output that is generated by a web application is valid. Other methods are therefore needed to verify output correctness for web applications. Figure 2.1 shows an example of a Java Servlet that will generate invalid HTML if the value of the input parameter name is the string "World". The extra